A New Method for Extracting Key Terms from Micro-Blogs Messages Using Wikipedia
Abstract
This study describes a method for extracting key terms from micro-blog messages using information obtained by analysing the structure and content of the online encyclopaedia Wikipedia. The algorithm is based on calculating the "keyphraseness" of each term, i.e., an estimate of the probability that the term would be chosen as a key term in a text. In evaluation, the algorithm showed satisfactory results on this task, significantly outperforming existing algorithms. To demonstrate a possible application, the algorithm has been implemented in a prototype contextual advertising system. Several options have also been formulated for using information obtained by analysing Twitter messages in various support services.
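The abstract does not give the formula for keyphraseness. A minimal sketch follows, assuming one common estimate: the number of Wikipedia articles in which a term appears as link anchor text, divided by the number of articles in which it appears at all. The counts in `WIKI_COUNTS`, the function names, the threshold, and the n-gram limit are all hypothetical and chosen only for illustration; they are not taken from the study.

```python
from typing import Dict, List, Tuple

# Hypothetical toy statistics (NOT from the study). For each term:
# (articles where the term is link anchor text, articles where it appears at all).
WIKI_COUNTS: Dict[str, Tuple[int, int]] = {
    "machine learning": (800, 1000),
    "twitter": (450, 900),
    "today": (5, 5000),
}


def keyphraseness(term: str, counts: Dict[str, Tuple[int, int]] = WIKI_COUNTS) -> float:
    """Estimate the probability that `term` is chosen as a key term.

    Assumed estimate: linked occurrences / total occurrences.
    Terms with no statistics score 0.0.
    """
    linked, total = counts.get(term.lower(), (0, 0))
    return linked / total if total else 0.0


def extract_key_terms(
    message: str, threshold: float = 0.1, max_ngram: int = 3
) -> List[Tuple[str, float]]:
    """Score every word n-gram in the message and keep those at or above the threshold."""
    # Naive whitespace tokenization; a real system would also handle
    # punctuation, hashtags, and mentions.
    tokens = message.lower().split()
    scored: Dict[str, float] = {}
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            candidate = " ".join(tokens[i : i + n])
            score = keyphraseness(candidate)
            if score >= threshold:
                scored[candidate] = score
    # Highest keyphraseness first.
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(extract_key_terms("Reading about machine learning on twitter today"))
```

With these toy counts, "today" is dropped because it is almost never used as a link anchor, while "machine learning" and "twitter" are retained. The threshold trades precision against recall and would need tuning on real data.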
Similar resources
A Hybrid Method for Extracting Key Terms of Text Documents
Key terms are important terms in a document that give the reader a high-level description of its contents. Extracting key terms is a basic step for many problems in natural language processing, such as document classification, document clustering, text summarization, and identifying the general subject of a document. This article proposed a new method for extracting key terms from text docum...
Wikipedia Article Content Based Query Expansion in IR4QA System
This paper describes the work of our WUST group in NTCIR-8 on the subtask of English to Simplified Chinese and Simplified Chinese to Simplified Chinese information retrieval for question answering (EN-CS and CS-CS IR4QA). In order to enhance the precision and efficiency in question analysis, we employ a special question analysis method extracting more appropriate key terms and apply the query e...
Harvesting Domain-Specific Terms using Wikipedia
We present a simple but effective method of automatically extracting domain-specific terms using Wikipedia as training data (i.e. self-supervised learning). Our first goal is to show, using human judgments, that Wikipedia categories are domain-specific and thus can replace manually annotated terms. Second, we show that identifying such terms using harvested Wikipedia categories and entities as s...
Extracting Trust from Domain Analysis: A Case Study on the Wikipedia Project
The problem of identifying trustworthy information on the World Wide Web is becoming increasingly acute as new tools such as wikis and blogs simplify and democratize publication. Wikipedia is the most extraordinary example of this phenomenon and, although a few mechanisms have been put in place to improve contribution quality, trust in Wikipedia content quality has been seriously questioned. ...
Extracting location and creator-related information from Wikipedia-based information-rich taxonomy for ConceptNet expansion
Our research goal is to generate new assertions suitable for introduction to the Japanese part of the ConceptNet common sense knowledge ontology. In this paper we present a method for extracting IsA assertions (hyponymy relations), AtLocation assertions (informing of the location of an object or place), LocatedNear assertions (informing of neighboring locations) and CreatedBy assertions (inform...
Publication date: 2013